
Efficient Implementation of Convolution Functions on FPGAs Using Parameterizable Blocks and Polynomial Approximations

Magalhães, Philippe, Fresse, Virginie, Suffran, Benoît, Alata, Olivier

arXiv.org Artificial Intelligence

Implementing convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) has emerged as a promising alternative to GPUs, offering lower latency, higher power efficiency, and greater flexibility. However, this development remains complex due to the hardware knowledge required and the long synthesis, placement, and routing stages, which slow down design cycles and prevent rapid exploration of network configurations, making resource optimization under severe constraints particularly challenging. This paper proposes a library of configurable convolution blocks designed to optimize FPGA implementation and adapt to available resources. It also presents a methodological framework for developing mathematical models that predict FPGA resource utilization. The approach is validated by analyzing the correlation between block parameters and measured resource usage, followed by error-metric evaluation of the predictions. The results show that the designed blocks enable adaptation of convolution layers to hardware constraints, and that the models accurately predict resource consumption, providing a useful tool for FPGA selection and optimized CNN deployment.
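The resource-prediction idea can be sketched as a small regression exercise. Everything below is an illustrative assumption, not the paper's actual model or data: the chosen parameters (kernel size, bit width), the interaction-term model, and the sample LUT counts are all made up for demonstration.

```python
import numpy as np

# Hypothetical convolution-block configurations: (kernel_size, bit_width).
# These values and the LUT counts below are illustrative, not from the paper.
params = np.array([
    [3, 8], [3, 16],
    [5, 8], [5, 16],
    [7, 8], [7, 16],
])
luts = np.array([120, 250, 330, 680, 640, 1310])  # made-up measurements

# Design matrix with an intercept and a kernel_size * bit_width interaction.
k, w = params[:, 0], params[:, 1]
X = np.column_stack([np.ones_like(k), k, w, k * w])
coeffs, *_ = np.linalg.lstsq(X.astype(float), luts.astype(float), rcond=None)

def predict_luts(kernel_size, bit_width):
    """Predict LUT usage for a candidate block configuration."""
    x = np.array([1.0, kernel_size, bit_width, kernel_size * bit_width])
    return float(x @ coeffs)
```

A model like this, once fitted per block type, lets a designer rank candidate configurations against a device's resource budget before running synthesis.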



Convolutions Through the Lens of Tensor Networks

Dangel, Felix

arXiv.org Artificial Intelligence

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the generalization of theoretical and algorithmic ideas. We provide a new perspective on convolutions through tensor networks (TNs), which allow reasoning about the underlying tensor multiplications by drawing diagrams and manipulating them to perform function transformations, sub-tensor access, and fusion. We demonstrate this expressive power by deriving the diagrams of various autodiff operations and popular approximations of second-order information with full hyper-parameter support, batching, channel groups, and generalization to arbitrary convolution dimensions. Further, we provide convolution-specific transformations based on the connectivity pattern which allow re-wiring and simplifying diagrams before evaluation. Finally, we probe computational performance, relying on established machinery for efficient TN contraction. Our TN implementation speeds up a recently proposed KFAC variant by up to 4.5x and enables new hardware-efficient tensor dropout for approximate backpropagation. Despite the success of transformers [68], CNNs continue to be widely used and show competitive performance when incorporating architecture modernizations [41; 40] and attention [30; 70; 11; 39]. While the intuition behind convolution is simple to grasp from graphical illustrations such as those in Dumoulin & Visin [20], convolutions are more challenging to analyze than fully-connected layers in multi-layer perceptrons (MLPs). One reason is that the operation is hard to express in matrix notation, and, even when switching to index notation, compact expressions that are convenient to work with only exist for special hyper-parameter choices [e.g. The many hyper-parameters of convolution and additional features like channel groups [35] introduce further complexity, and related objects like (higher-order) derivatives and the associated autodiff routines inherit it.
TNs express tensor multiplications as diagrams (Figure 1).
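As a minimal illustration of the TN view, a 1-D convolution can be written as a contraction between the kernel tensor and an index-pattern (patch) tensor of the input. The unfold-then-`einsum` formulation below is a generic sketch of this idea, not the paper's implementation.

```python
import numpy as np

def conv1d_tn(x, kernel):
    """Valid 1-D convolution (cross-correlation) as a tensor contraction:
    unfold the input into a patch tensor, then contract the shared
    input-channel and kernel-offset indices with einsum."""
    c_in, length = x.shape
    c_out, c_in_k, k = kernel.shape
    assert c_in == c_in_k
    # Patch tensor P[c, l, j] = x[c, l + j] -- the "index pattern" of the
    # convolution's connectivity.
    idx = np.arange(length - k + 1)[:, None] + np.arange(k)[None, :]
    patches = x[:, idx]                       # shape (c_in, L - k + 1, k)
    # y[o, l] = sum_{c, j} K[o, c, j] * P[c, l, j]
    return np.einsum('ocj,clj->ol', kernel, patches)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))               # 2 input channels, length 8
w = rng.standard_normal((3, 2, 3))            # 3 output channels, kernel 3
y = conv1d_tn(x, w)                           # shape (3, 6)
```

The same contraction, drawn as a diagram, is what TN manipulations re-wire and simplify before handing off to an efficient contraction engine.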


Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics

Zhu, Ge, Li, Jinbao, Guo, Yahong

arXiv.org Artificial Intelligence

Current methods aggregate multi-level features or introduce edge and skeleton cues to get more refined saliency maps. However, little attention is paid to how to obtain the complete salient object in a cluttered background, where the targets are usually similar in color and texture to the background. To handle this complex scene, we propose a sharp eyes network (SENet) that first separates the object from the scene and then finely segments it, in line with human visual characteristics, i.e., to look first and then focus. Different from previous methods which directly integrate edge or skeleton cues to supplement the defects of objects, the proposed method aims to utilize the expanded objects to guide the network to obtain a complete prediction. Specifically, SENet mainly consists of a target separation (TS) branch and an object segmentation (OS) branch trained by minimizing a new hierarchical difference aware (HDA) loss. In the TS branch, we construct a fractal structure to produce saliency features with expanded boundary via the supervision of expanded ground truth, which can enlarge the detail difference between foreground and background. In the OS branch, we first aggregate multi-level features to adaptively select complementary components, and then feed the saliency features with expanded boundary into the aggregated features to guide the network to obtain a complete prediction. Moreover, we propose the HDA loss to further improve the structural integrity and local details of the salient objects, which assigns a weight to each pixel according to its distance from the boundary hierarchically. Hard pixels with similar appearance in the border region are given more attention hierarchically to emphasize their importance in completeness prediction. Comprehensive experimental results on five datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
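The hierarchical boundary weighting behind a loss like HDA can be sketched as follows. The erosion-based banding and the `levels` and `decay` parameters are illustrative assumptions for demonstration, not the paper's exact definition.

```python
import numpy as np

def boundary_weights(mask, levels=3, decay=0.5):
    """Hedged sketch of hierarchical boundary weighting: pixels in bands
    closer to the object boundary (on either side) get larger weights,
    decaying level by level toward the far interior/exterior."""
    def erode(m):
        # 4-neighbour erosion with edge padding so the image border is
        # not treated as an object boundary.
        p = np.pad(m, 1, mode='edge')
        return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    mask = mask.astype(bool)
    w = np.zeros(mask.shape, dtype=float)
    inner, outer = mask.copy(), ~mask
    for level in range(levels):
        new_inner, new_outer = erode(inner), erode(outer)
        # Band of pixels peeled off at this level, on both sides of the edge.
        ring = (inner & ~new_inner) | (outer & ~new_outer)
        w[ring] = decay ** level              # nearer boundary => larger weight
        inner, outer = new_inner, new_outer
    w[inner | outer] = decay ** levels        # far interior/exterior
    return w
```

Such a weight map would then multiply a per-pixel loss (e.g. binary cross-entropy) so that hard pixels near the border dominate training.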


Deep Multimodal Subspace Clustering Networks

Abavisani, Mahdi, Patel, Vishal M.

arXiv.org Machine Learning

Abstract--We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: multimodal encoder, self-expressive layer, and multimodal decoder. The encoder takes multimodal data as input and fuses them into a latent-space representation. We investigate early, late, and intermediate fusion techniques and propose three corresponding encoders for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same across the different spatial fusion-based approaches. In addition to the various spatial fusion-based methods, an affinity fusion-based network is also proposed in which the self-expressive layers corresponding to different modalities are enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods. Many practical applications in image processing, computer vision, and speech processing require one to process very high-dimensional data. However, these data often lie in a low-dimensional subspace. For instance, facial images with variation in illumination [1], handwritten digits [2], and trajectories of a rigidly moving object in a video [3] are examples where high-dimensional data can be represented by low-dimensional subspaces. Subspace clustering algorithms essentially use this fact to find clusters in different subspaces within a dataset [4].
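The self-expressive idea (each latent code reconstructed as a combination of the others) can be sketched in closed form, using ridge regularization in place of the learned layer in the paper; the `lam` hyper-parameter and the closed-form solve are assumptions for illustration.

```python
import numpy as np

def self_expressive(z, lam=1e-2):
    """Hedged sketch of self-expressiveness: find coefficients C with
    Z ~ C Z (rows of z are latent codes), solved in closed form with a
    ridge penalty, then symmetrized into an affinity for spectral
    clustering."""
    g = z @ z.T                               # Gram matrix, shape (n, n)
    n = g.shape[0]
    c = g @ np.linalg.inv(g + lam * np.eye(n))
    np.fill_diagonal(c, 0.0)                  # forbid trivial self-representation
    affinity = np.abs(c) + np.abs(c).T        # symmetric affinity matrix
    return c, affinity
```

In the full framework this coefficient matrix is shared (or enforced to agree) across modalities, and the affinity it induces is fed to spectral clustering.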


Do not Choose Representation just Change: An Experimental Study in States based EA

Bercachi, Maroun, Collard, Philippe, Clergue, Manuel, Verel, Sebastien

arXiv.org Artificial Intelligence

Our aim in this paper is to analyse the phenotypic effects (evolvability) of diverse coding conversion operators in an instance of the states-based evolutionary algorithm (SEA). Since the representation of solutions, or the selection of the best encoding during the optimization process, has been shown to be very important for the efficiency of evolutionary algorithms (EAs), we discuss a strategy that couples more than one representation with different procedures for converting from one coding to another during the search. Elsewhere, some EAs use multiple representations (SM-GA, SEA, etc.) with the intention of benefiting from the characteristics of each. Beyond those results, this paper shows that changing the representation is also a crucial approach to consider when attempting to increase the performance of such EAs. As a demonstrative example, we use a two-state SEA (2-SEA) with two identical search spaces but different coding conversion operators. The results show that the way of changing from one coding to another, and not only the choice of the best representation or the representation itself, is very advantageous and must be taken into account in order to properly design EAs and improve their execution.
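The abstract does not spell out the conversion operators used; as a generic illustration of converting between two codings of the same search space, here is the standard binary-to-reflected-Gray conversion and its inverse, a classic pair of representations in EA studies (not necessarily the paper's operators).

```python
def binary_to_gray(bits):
    """Convert a binary-string genotype to reflected Gray code:
    the first bit is kept, each later bit is XORed with its predecessor."""
    return bits[0] + ''.join(
        str(int(a) ^ int(b)) for a, b in zip(bits, bits[1:])
    )

def gray_to_binary(bits):
    """Inverse conversion: Gray-coded genotype back to standard binary."""
    out = [bits[0]]
    for b in bits[1:]:
        out.append(str(int(out[-1]) ^ int(b)))
    return ''.join(out)
```

Applying such a conversion mid-search changes which genotypes are neighbours under bit-flip mutation while leaving the set of encoded solutions unchanged, which is exactly the kind of phenotypic effect the paper studies.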